The Complexity of Mean Payoff Games

Authors

  • Uri Zwick
  • Mike Paterson
Abstract

1 Introduction

Let $G = (V, E)$ be a finite directed graph in which each vertex has at least one edge going out of it. Let $w : E \to \{-W, \ldots, 0, \ldots, W\}$ be a function that assigns an integral weight to each edge of $G$. Ehrenfeucht and Mycielski [EM79] studied the following infinite two-person game played on such a graph. The game starts at a vertex $a_0 \in V$. The first player chooses an edge $e_1 = (a_0, a_1) \in E$. The second player then chooses an edge $e_2 = (a_1, a_2) \in E$, and so on indefinitely. The first player wants to maximise $\liminf_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} w(e_i)$. The second player wants to minimise $\limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} w(e_i)$.

Ehrenfeucht and Mycielski show that each such game has a value $v$ such that the first player has a strategy that ensures that $\liminf_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} w(e_i) \ge v$, while the second player has a strategy that ensures that $\limsup_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} w(e_i) \le v$. Furthermore, they show that both players can achieve this value using a positional strategy, i.e., a strategy in which the next move depends only on the vertex from which the player is to move.

Without loss of generality, we may assume that the graph $G = (V, E)$ on which such a game is played is bipartite, with $V_1$ and $V_2$ being the partition of the vertices into the two 'sides' and with $E = E_1 \cup E_2$ such that $E_1 \subseteq V_1 \times V_2$ and $E_2 \subseteq V_2 \times V_1$. If the original graph is not bipartite, we simply duplicate the set of vertices.

To obtain their results for the infinite game, Ehrenfeucht and Mycielski [EM79] also consider the following finite version of the game. Again the game starts at a specific vertex of the graph, which is assumed to be bipartite. The players alternate in choosing successive edges that form a path, but the game ends as soon as a cycle is formed. The outcome of the game is then the mean weight of the edges on this cycle. The first player wants to maximise and the second player to minimise this outcome. This game is a finite perfect-information two-person game and so, by definition, has a value. Ehrenfeucht and Mycielski [EM79] show that the value $v$ of this finite game is also the value of the infinite game described above. Furthermore, they show, surprisingly perhaps, that both players have positional optimal strategies for the finite game. The positional optimal strategies of the finite game are also positional optimal strategies for the infinite game.

* Supported in part by the ESPRIT Basic Research Action Programme of the EC under contract No. 7141 (project ALCOM II). E-mail addresses of authors: zwick@math.tau.ac.il and Mike.Paterson@dcs.warwick.ac.uk.

Ehrenfeucht and Mycielski [EM79] give no efficient algorithm for finding optimal strategies for the finite and infinite games. We complement their work by exhibiting an $O(|V|^3 \cdot |E| \cdot W)$ time algorithm for finding the values of the games played on a graph $G = (V, E)$. The algorithm finds the values of all the vertices of the graph. Games starting at different vertices may have different values, of course. We also give an $O(|V|^4 \cdot |E| \cdot \log(|E|/|V|) \cdot W)$ time algorithm for finding positional optimal strategies for both players. Our algorithms are polynomial in the size of the graph but only pseudo-polynomial in the weights; they are polynomial if the weights are presented in unary notation. In particular, our algorithms work in polynomial time if the weights are taken from, say, $\{-1, 0, +1\}$. This is already a non-trivial case.

We also consider situations in which one player knows in advance the positional strategy to be used by the other player. Using a result of Karp [Kar78] we show that an optimal counter-strategy can be found in (strongly) polynomial time. This immediately implies that the decision problem associated with the game is in NP ∩ co-NP.
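
The finite game described above is easy to simulate once both players commit to positional strategies: the play follows the chosen edges until a vertex repeats, and the outcome is the mean weight of the cycle that was closed. The Python sketch below illustrates this. The function name `cycle_outcome`, the dictionary encoding of the graph and strategies, and the four-vertex example are assumptions made for illustration and do not come from the paper.

```python
from fractions import Fraction

def cycle_outcome(weights, pi1, pi2, start):
    """Outcome of the finite game when both players follow positional strategies.

    weights: dict mapping each edge (a, b) to its integer weight
    pi1, pi2: dicts mapping each vertex of V1 (resp. V2) to the chosen successor
    """
    move = {**pi1, **pi2}          # V1 and V2 are disjoint, so the merge is safe
    path = [start]
    position = {start: 0}          # index at which each vertex first appears
    while True:
        nxt = move[path[-1]]
        if nxt in position:        # a cycle has just been closed
            cycle = path[position[nxt]:] + [nxt]
            break
        position[nxt] = len(path)
        path.append(nxt)
    cycle_weights = [weights[(a, b)] for a, b in zip(cycle, cycle[1:])]
    return Fraction(sum(cycle_weights), len(cycle_weights))

# Hypothetical example: V1 = {"a", "c"} (maximiser), V2 = {"b", "d"} (minimiser).
weights = {
    ("a", "b"): 3, ("a", "d"): -1,
    ("b", "a"): -2, ("b", "c"): 0,
    ("c", "d"): 1, ("c", "b"): 2,
    ("d", "c"): -1, ("d", "a"): 4,
}
pi1 = {"a": "b", "c": "d"}
pi2 = {"b": "c", "d": "c"}

# The play a, b, c, d, c closes the cycle c -> d -> c with weights 1 and -1.
outcome = cycle_outcome(weights, pi1, pi2, "a")
print(outcome)  # prints 0
```

Exact rationals (`Fraction`) are used because the mean weight of a cycle is in general not an integer.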
The decision problem corresponding to mean payoff games (MPG's) is thus in NP ∩ co-NP and is solvable in pseudo-polynomial time, but is not yet known to be in P. This gives the MPG problem a rare status shared only by a few number-theoretic problems, such as primality [Pra75].

Mean payoff games have been considered independently by Gurvich, Karzanov and Khachiyan [GKK88]. They were not aware of the results of Ehrenfeucht and Mycielski and gave an alternative proof of the fact that both players in mean payoff games, or cyclic games as they call them, have positional optimal strategies. Gurvich et al. give an algorithm for finding such optimal strategies, but the worst-case complexity of their algorithm is exponential. Further generalisations and variants of mean payoff games have also been considered by Karzanov and Lebedev [KL93], who also point out that the decision problem corresponding to mean payoff games is in NP ∩ co-NP.

Condon [Con92] has recently studied the complexity of simple stochastic games (SSG's), introduced originally by Shapley [Sha53]. Condon shows that the decision problem corresponding to SSG's is also in NP ∩ co-NP. While MPG's are deterministic, SSG's are games of chance. We describe a simple reduction from MPG's to SSG's in two steps. We first describe a reduction from MPG's to discounted payoff games (DPG's), and then a reduction from DPG's to SSG's. The reduction from MPG's to SSG's shows that SSG's are at least as hard as MPG's. We believe that the MPG problem is strictly easier than the SSG problem. As attempts to obtain polynomial time algorithms for SSG's have not yet borne fruit, it may be interesting to focus attention on the possibly easier problem of obtaining a polynomial time algorithm for MPG's.

2 Finding the values of a game

Let $G = (V, E)$ be a graph and let $w : E \to \{-W, \ldots, 0, \ldots, W\}$ be a weight function on its edges. Let $|V| = n$.
We assume that the graph is bipartite, with $V_1$ being the set of vertices from which player I is to play and $V_2$ being the set of vertices from which player II is to play. Our first goal is to find, for each vertex $a \in V$, the value $v(a)$ of the finite and infinite games that start at $a$. Recall that the values of the finite and infinite games are equal. If $a \in V_1$ then player I (the maximiser) is to play first, and if $a \in V_2$ then player II (the minimiser) is to play first.

To reach this goal we consider a third version of the game. This time the two players play the game for exactly $k$ steps, constructing a path of length $k$, and the weight of this path is the outcome of the game. The length of the game is known in advance to both players. We let $v_k(a)$ be the value of this game started at $a \in V$, where player I or II plays first according to whether $a \in V_1$ or $a \in V_2$.

Theorem 1. The values $v_k(a)$, for all $a \in V$, can be computed in $O(k \cdot |E|)$ time.

Proof. The result follows easily from the following recursive relation
$$v_k(a) = \begin{cases} \max_{(a,b) \in E} \{ w(a,b) + v_{k-1}(b) \} & \text{if } a \in V_1, \\ \min_{(a,b) \in E} \{ w(a,b) + v_{k-1}(b) \} & \text{if } a \in V_2, \end{cases}$$
along with the initial condition $v_0(a) = 0$ for every $a \in V$. □

It seems intuitively clear that $\lim_{k \to \infty} v_k(a)/k = v(a)$, where $v(a)$ is the value of the infinite game that starts at $a$. The next theorem shows that this is indeed the case. In the proof of this theorem we rely on the result, proved by Ehrenfeucht and Mycielski, that both players have positional optimal strategies. A positional strategy for player I is just a mapping $\pi_1 : V_1 \to V_2$ such that $(a_1, \pi_1(a_1)) \in E_1$ for every $a_1 \in V_1$. Similarly, a positional strategy for player II is a mapping $\pi_2 : V_2 \to V_1$ such that $(a_2, \pi_2(a_2)) \in E_2$ for every $a_2 \in V_2$.

Theorem 2. For every $a \in V$ we have $k \cdot v(a) - 2nW \le v_k(a) \le k \cdot v(a) + 2nW$.

Proof. Let $\pi_1 : V_1 \to V_2$ be a positional optimal strategy for player I in the finite game starting at $a$.
We show that if player I plays using the strategy $\pi_1$ then the outcome of a $k$-step game is at least $(k - n) \cdot v(a) - nW$. Consider a game in which player I plays according to $\pi_1$. Push (copies of) the edges played by the players onto a stack. Whenever a cycle is formed, it follows from the fact that $\pi_1$ is an optimal strategy for player I in the finite game that the mean weight of the cycle formed is at least $v(a)$. The edges that participate in that cycle lie consecutively at the top of the stack. They are all removed and the process continues. Note that at each stage the stack contains at most $n$ edges and the weight of each of them is at least $-W$. Player I can therefore ensure that the total weight of the edges encountered in a $k$-step game starting from $a$ is at least $(k - n) \cdot v(a) - nW$. This is at least $k \cdot v(a) - 2nW$ as $v(a) \le W$.

Similarly, if player II plays according to a positional optimal strategy $\pi_2 : V_2 \to V_1$ of the finite game that starts at $a$, she can make sure that the mean weight of each cycle closed is at most $v(a)$. At most $n$ edges are left on the stack and the weight of each of them is at most $W$. She can therefore ensure that the total weight of the edges encountered in a $k$-step game starting at $a$ is at most $(k - n) \cdot v(a) + nW \le k \cdot v(a) + 2nW$. □

We can now describe the algorithm for computing the exact values of the finite and infinite games.

Theorem 3. Let $G = (V, E)$ be a directed graph and let $w : E \to \{-W, \ldots, 0, \ldots, W\}$ be a weight function on its edges. The value $v(a)$, for every $a \in V$, corresponding to the infinite and finite games that start at all the vertices of $V$ can be computed in $O(|V|^3 \cdot |E| \cdot W)$ time.

Proof. Compute the values $v_k(a)$, for every $a \in V$, for $k = 4n^3 W$. This can be done, according to Theorem 1, in $O(|V|^3 \cdot |E| \cdot W)$ time. For each vertex $a \in V$, compute the estimate $v'(a) = v_k(a)/k$. By Theorem 2, we get that
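
The recurrence of Theorem 1 and the estimate $v_k(a)/k$ used in the proof of Theorem 3 can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function name `k_step_values`, the dictionary encoding of the graph, and the four-vertex example are assumptions made for this sketch. It relies on the paper's assumption that every vertex has at least one outgoing edge.

```python
from fractions import Fraction

def k_step_values(edges, v1, k):
    """Values v_k(a) of the k-step game, computed by the recurrence of Theorem 1.

    edges: dict mapping each edge (a, b) to its integer weight
    v1: set of vertices from which player I (the maximiser) moves
    """
    successors = {}
    for (a, b), w in edges.items():
        successors.setdefault(a, []).append((b, w))
    values = {a: 0 for a in successors}            # initial condition v_0(a) = 0
    for _ in range(k):                             # each round scans every edge once
        values = {
            a: (max if a in v1 else min)(w + values[b] for b, w in succ)
            for a, succ in successors.items()
        }
    return values

# Hypothetical example: V1 = {"a", "c"} (maximiser), V2 = {"b", "d"} (minimiser).
edges = {
    ("a", "b"): 3, ("a", "d"): -1,
    ("b", "a"): -2, ("b", "c"): 0,
    ("c", "d"): 1, ("c", "b"): 2,
    ("d", "c"): -1, ("d", "a"): 4,
}
V1 = {"a", "c"}

v2 = k_step_values(edges, V1, 2)       # {"a": 1, "b": 1, "c": 0, "d": 1}
k = 1000
vk = k_step_values(edges, V1, k)
estimates = {a: Fraction(x, k) for a, x in vk.items()}   # v_k(a) / k
print(estimates)
```

Each of the $k$ rounds touches every edge once, which matches the $O(k \cdot |E|)$ bound of Theorem 1. In this example the estimates all lie within $2nW/k$ of $1/2$, as Theorem 2 requires when the game value is $1/2$ with $n = 4$ and $W = 4$.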

Similar resources

The Complexity of Multi-Mean-Payoff and Multi-Energy Games

In mean-payoff games, the objective of the protagonist is to ensure that the limit average of an infinite sequence of numeric weights is nonnegative. In energy games, the objective is to ensure that the running sum of weights is always nonnegative. Multi-mean-payoff and multi-energy games replace individual weights by tuples, and the limit average (resp., running sum) of each coordinate must be...

On the computational complexity of solving stochastic mean-payoff games

We consider some well-known families of two-player, zero-sum, turn-based, perfect-information games that can be viewed as special cases of Shapley's stochastic games. We show that the following tasks are polynomial-time equivalent:
  • solving simple stochastic games,
  • solving stochastic mean-payoff games with rewards and probabilities given in unary, and
  • solving stochastic mean-payoff games ...

Faster algorithms for mean-payoff games

In this paper, we study algorithmic problems for quantitative models that are motivated by the applications in modeling embedded systems. We consider two-player games played on a weighted graph with mean-payoff objective and with energy constraints. We present a new pseudopolynomial algorithm for solving such games, improving the best known worst-case complexity for pseudopolynomial mean-payoff...

Deciding the Winner in Parity Games Is in UP ∩ co-UP

We observe that the problem of deciding the winner in mean payoff games is in the complexity class UP ∩ co-UP. We also show a simple reduction from parity games to mean payoff games. From this it follows that deciding the winner in parity games and the modal μ-calculus model checking are in UP ∩ co-UP.

The Multiple Dimensions of Mean-Payoff Games

We consider quantitative game models for the design of reactive systems working in a resource-constrained environment. The game is played on a finite weighted graph where some resource (e.g., battery) can be consumed or recharged along the edges of the graph. In mean-payoff games, the resource usage is computed as the long-run average resource consumption. In energy games, the resource us...

Reduction of stochastic parity to stochastic mean-payoff games

A stochastic graph game is played by two players on a game graph with probabilistic transitions. We consider stochastic graph games with ω-regular winning conditions specified as parity objectives, and mean-payoff (or limit-average) objectives. These games lie in NP ∩ coNP. We present a polynomial-time Turing reduction of stochastic parity games to stochastic mean-payoff games.

Publication date: 1995